{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Building Meta-SGD from Scratch"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "In the last section we saw how Meta-SGD works. We saw how Meta-SGD obtains a better and robust model parameter $\\theta$ that is generalizable across tasks along with optimal learning rate and update direction. \n",
    "\n",
    "\n",
    "Now we will better understand Meta-SGD by coding them from scratch. For better understanding, we consider a simple binary classification task. We randomly generate our input data and we train them with a simple single layer neural network and try to find the optimal parameter $\\theta$. \n",
    "\n",
    "Now we will step by step how exactly we are doing this,\n",
    "\n",
    "First we import all the necessary libraries,"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "import numpy as np"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Generate Data Points"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now we define a function called sample_points for generating our input (x,y) pairs. It takes the parameter k as an input which implies number of (x,y) pairs we want to sample. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "def sample_points(k):\n",
    "    x = np.random.rand(k,50)\n",
    "    y = np.random.choice([0, 1], size=k, p=[.5, .5]).reshape([-1,1])\n",
    "    return x,y"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The above function returns output as follows, "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[5.01913307e-01 1.01874941e-01 7.16678998e-01 3.90294047e-01\n",
      " 2.95330904e-01 8.66751993e-01 5.09988127e-01 8.59389493e-01\n",
      " 5.16202142e-01 7.92016358e-01 8.24237307e-01 7.76739141e-01\n",
      " 8.57034917e-01 2.75862141e-01 6.44874856e-01 2.75248940e-01\n",
      " 5.67665047e-01 9.61564994e-01 7.58931873e-01 1.08989614e-02\n",
      " 7.69325529e-01 4.05955016e-01 1.98799935e-01 9.94134622e-01\n",
      " 3.07179216e-01 1.34756367e-01 2.92326855e-01 5.00026528e-01\n",
      " 7.23673231e-01 5.28698231e-01 1.52495715e-01 9.20139339e-01\n",
      " 1.76127500e-02 2.42244262e-01 7.09515862e-01 7.10358091e-01\n",
      " 6.47656449e-01 5.15623266e-01 8.77002211e-01 4.18744855e-01\n",
      " 9.67902538e-01 8.79261670e-01 5.88524781e-01 5.11397703e-02\n",
      " 7.07513737e-01 4.61998029e-01 8.77306226e-01 5.32049083e-01\n",
      " 8.07178697e-01 5.01521846e-04]\n",
      "[1]\n"
     ]
    }
   ],
   "source": [
    "x, y = sample_points(10)\n",
    "print x[0]\n",
    "print y[0]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Single Layer Neural Network"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "For simplicity and better understand, we use a neural network with only single layer for predicting the output. i.e,\n",
    "\n",
    "a = np.matmul(X, theta)\n",
    "\n",
    "YHat = sigmoid(a)\n",
    "\n",
    "\n",
    "\n",
    "__*So, we use Meta-SGD for finding this optimal parameter value $\\theta$ and also optimal learning rate and update direction that is generalizable across tasks. So that \n",
    "for a new task, we can learn from a few data points in a lesser time by taking very less gradient steps.*__"
   ]
  },
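  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As a quick sanity check of this forward pass, we can run it once on a few sampled points with a randomly initialized theta (the random theta here is purely illustrative; the actual theta is learned by Meta-SGD below):"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "#sanity check of the single layer forward pass with an illustrative random theta\n",
    "theta = np.random.normal(size=50).reshape(50, 1)\n",
    "\n",
    "X, Y = sample_points(5)\n",
    "\n",
    "a = np.matmul(X, theta)\n",
    "\n",
    "#sigmoid squashes the activations into (0, 1), i.e class probabilities\n",
    "YHat = 1.0 / (1 + np.exp(-a))\n",
    "\n",
    "#(5, 1) - one probability per sampled point\n",
    "print(YHat.shape)"
   ]
  },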
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Meta-SGD"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now, we define a class called MetaSGD where we implement the Meta-SGD algorithm. In the \\__init__  method we will initialize all the necessary variables. Then we define our sigmoid activation function. Followed by we define our train function. "
   ]
  },
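  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To recap, these are the two updates the class implements. For each task $T_i$, the inner update adapts $\\theta$ with the learned per-parameter learning rate $\\alpha$ (where $\\circ$ denotes elementwise multiplication):\n",
    "\n",
    "$$\\theta_i' = \\theta - \\alpha \\circ \\nabla_{\\theta} \\mathcal{L}_{T_i}(f_{\\theta})$$\n",
    "\n",
    "The outer (meta) update then moves both $\\theta$ and $\\alpha$ using the loss of the adapted models on newly sampled points:\n",
    "\n",
    "$$(\\theta, \\alpha) \\leftarrow (\\theta, \\alpha) - \\beta \\, \\nabla_{(\\theta, \\alpha)} \\sum_{T_i} \\mathcal{L}_{T_i}(f_{\\theta_i'})$$\n",
    "\n",
    "Note that in the simplified implementation below, the gradient with respect to $\\alpha$ is approximated by reusing the first-order meta gradient computed for $\\theta$."
   ]
  },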
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "You can check the comments written above each line of code for understanding."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "class MetaSGD(object):\n",
    "    def __init__(self):\n",
    "        \n",
    "        #initialize number of tasks i.e number of tasks we need in each batch of tasks\n",
    "        self.num_tasks = 2\n",
    "        \n",
    "        #number of samples i.e number of shots  -number of data points (k) we need to have in each task\n",
    "        self.num_samples = 10\n",
    "\n",
    "        #number of epochs i.e training iterations\n",
    "        self.epochs = 10000\n",
    "        \n",
    "        #hyperparameter for the inner loop (inner gradient update)\n",
    "        self.alpha = 0.0001\n",
    "        \n",
    "        #hyperparameter for the outer loop (outer gradient update) i.e meta optimization\n",
    "        self.beta = 0.0001\n",
    "       \n",
    "        #randomly initialize our model parameter theta\n",
    "        self.theta = np.random.normal(size=50).reshape(50, 1)\n",
    "         \n",
    "        #randomly initialize alpha with same shape as theta\n",
    "        self.alpha = np.random.normal(size=50).reshape(50, 1)\n",
    "      \n",
    "    #define our sigmoid activation function  \n",
    "    def sigmoid(self,a):\n",
    "        return 1.0 / (1 + np.exp(-a))\n",
    "    \n",
    "    \n",
    "    #now let us get to the interesting part i.e training :P\n",
    "    def train(self):\n",
    "        \n",
    "        #for the number of epochs,\n",
    "        for e in range(self.epochs):        \n",
    "            \n",
    "            self.theta_ = []\n",
    "            \n",
    "            #for task i in batch of tasks\n",
    "            for i in range(self.num_tasks):\n",
    "               \n",
    "                #sample k data points and prepare our train set\n",
    "                XTrain, YTrain = sample_points(self.num_samples)\n",
    "                \n",
    "                a = np.matmul(XTrain, self.theta)\n",
    "\n",
    "                YHat = self.sigmoid(a)\n",
    "\n",
    "                #since we are performing classification, we use cross entropy loss as our loss function\n",
    "                loss = ((np.matmul(-YTrain.T, np.log(YHat)) - np.matmul((1 -YTrain.T), np.log(1 - YHat)))/self.num_samples)[0][0]\n",
    "                \n",
    "                #minimize the loss by calculating gradients\n",
    "                gradient = np.matmul(XTrain.T, (YHat - YTrain)) / self.num_samples\n",
    "\n",
    "                #update the gradients and find the optimal parameter theta' for each of tasks\n",
    "                self.theta_.append(self.theta - (np.multiply(self.alpha,gradient)))\n",
    "                \n",
    "     \n",
    "            #initialize meta gradients\n",
    "            meta_gradient = np.zeros(self.theta.shape)\n",
    "                        \n",
    "            for i in range(self.num_tasks):\n",
    "            \n",
    "                #sample k data points and prepare our test set for meta training\n",
    "                XTest, YTest = sample_points(10)\n",
    "\n",
    "                #predict the value of y\n",
    "                a = np.matmul(XTest, self.theta_[i])\n",
    "                \n",
    "                YPred = self.sigmoid(a)\n",
    "                           \n",
    "                #compute meta gradients\n",
    "                meta_gradient += np.matmul(XTest.T, (YPred - YTest)) / self.num_samples\n",
    "\n",
    "            #update our randomly initialized model parameter theta with the meta gradients\n",
    "            self.theta = self.theta-self.beta*meta_gradient/self.num_tasks\n",
    "                       \n",
    "            #update our randomly initialized hyperparameter alpha with the meta gradients\n",
    "            self.alpha = self.alpha-self.beta*meta_gradient/self.num_tasks\n",
    "                                       \n",
    "            if e%1000==0:\n",
    "                print \"Epoch {}: Loss {}\\n\".format(e,loss)             \n",
    "                print 'Updated Model Parameter Theta\\n'\n",
    "                print 'Sampling Next Batch of Tasks \\n'\n",
    "                print '---------------------------------\\n'"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "model = MetaSGD()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Epoch 0: Loss 2.22523195333\n",
      "\n",
      "Updated Model Parameter Theta\n",
      "\n",
      "Sampling Next Batch of Tasks \n",
      "\n",
      "---------------------------------\n",
      "\n",
      "Epoch 1000: Loss 0.951785305709\n",
      "\n",
      "Updated Model Parameter Theta\n",
      "\n",
      "Sampling Next Batch of Tasks \n",
      "\n",
      "---------------------------------\n",
      "\n",
      "Epoch 2000: Loss 1.47382270343\n",
      "\n",
      "Updated Model Parameter Theta\n",
      "\n",
      "Sampling Next Batch of Tasks \n",
      "\n",
      "---------------------------------\n",
      "\n",
      "Epoch 3000: Loss 1.07296354822\n",
      "\n",
      "Updated Model Parameter Theta\n",
      "\n",
      "Sampling Next Batch of Tasks \n",
      "\n",
      "---------------------------------\n",
      "\n",
      "Epoch 4000: Loss 1.10527536687\n",
      "\n",
      "Updated Model Parameter Theta\n",
      "\n",
      "Sampling Next Batch of Tasks \n",
      "\n",
      "---------------------------------\n",
      "\n",
      "Epoch 5000: Loss 1.51087197814\n",
      "\n",
      "Updated Model Parameter Theta\n",
      "\n",
      "Sampling Next Batch of Tasks \n",
      "\n",
      "---------------------------------\n",
      "\n",
      "Epoch 6000: Loss 0.894894377588\n",
      "\n",
      "Updated Model Parameter Theta\n",
      "\n",
      "Sampling Next Batch of Tasks \n",
      "\n",
      "---------------------------------\n",
      "\n",
      "Epoch 7000: Loss 1.58948948281\n",
      "\n",
      "Updated Model Parameter Theta\n",
      "\n",
      "Sampling Next Batch of Tasks \n",
      "\n",
      "---------------------------------\n",
      "\n",
      "Epoch 8000: Loss 1.8580723959\n",
      "\n",
      "Updated Model Parameter Theta\n",
      "\n",
      "Sampling Next Batch of Tasks \n",
      "\n",
      "---------------------------------\n",
      "\n",
      "Epoch 9000: Loss 0.950560331048\n",
      "\n",
      "Updated Model Parameter Theta\n",
      "\n",
      "Sampling Next Batch of Tasks \n",
      "\n",
      "---------------------------------\n",
      "\n"
     ]
    }
   ],
   "source": [
    "model.train()"
   ]
  }
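  ,
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "After meta training, adapting to a new task takes just one inner gradient step with the learned theta and alpha. Here is a minimal sketch; since sample_points generates random labels, the numbers are only illustrative:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "#adapt to a \"new task\" with a single inner gradient step\n",
    "XTrain, YTrain = sample_points(10)\n",
    "\n",
    "YHat = model.sigmoid(np.matmul(XTrain, model.theta))\n",
    "\n",
    "gradient = np.matmul(XTrain.T, (YHat - YTrain)) / 10\n",
    "\n",
    "#one step with the learned per-parameter learning rate alpha\n",
    "theta_new = model.theta - np.multiply(model.alpha, gradient)\n",
    "\n",
    "print(theta_new.shape)"
   ]
  }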
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python [default]",
   "language": "python",
   "name": "python2"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 2
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython2",
   "version": "2.7.11"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
